Support compression= in DataFrame.to_json #17634

Merged
1 commit merged into rapidsai:branch-25.02 on Dec 19, 2024

Conversation

mroeschke (Contributor)

Description

closes #17564
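
A minimal usage sketch of the feature this PR adds. The call signature mirrors pandas, but the file name and the orient/lines arguments below are illustrative assumptions, not taken from the PR itself:

import cudf

df = cudf.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write gzip-compressed JSON Lines; compression= is the keyword added by this PR.
df.to_json("data.jsonl.gz", orient="records", lines=True, compression="gzip")

# Round-trip through cudf.read_json, which already accepts compressed input.
roundtrip = cudf.read_json("data.jsonl.gz", lines=True, compression="gzip")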

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke added labels Python (Affects Python cuDF API), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) on Dec 19, 2024
@mroeschke self-assigned this on Dec 19, 2024
@mroeschke requested a review from a team as a code owner on December 19, 2024, 19:32
@github-actions bot added the pylibcudf (Issues specific to the pylibcudf package) label on Dec 19, 2024
@@ -54,6 +54,22 @@ def _get_cudf_schema_element_from_dtype(
return lib_type, child_types


def _to_plc_compression(
Reviewer (Contributor):

Do we use something like this in other places? If so, we could reuse it.

mroeschke (Contributor, Author):

It looks like we have something similar that we use for Parquet, but given that each format supports a different set of compressions (and maps slightly differently from Python), I can take a look at this in a follow-up.
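
For reference, a rough sketch of what a helper along these lines could do. The helper name comes from the diff above; the accepted strings and the exact mapping are assumptions, since the PR's implementation isn't shown here, and the enum members are assumed to match pylibcudf's CompressionType:

import pylibcudf as plc

def _to_plc_compression(compression):
    # Map the user-facing compression string to a pylibcudf CompressionType
    # (illustrative mapping; gzip is the only compression the JSON writer supports).
    mapping = {
        None: plc.io.types.CompressionType.NONE,
        "infer": plc.io.types.CompressionType.AUTO,
        "gzip": plc.io.types.CompressionType.GZIP,
    }
    try:
        return mapping[compression]
    except KeyError:
        raise ValueError(f"Unsupported compression type: {compression}")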

@@ -1453,3 +1453,12 @@ def test_chunked_json_reader():
with cudf.option_context("io.json.low_memory", True):
gdf = cudf.read_json(buf, lines=True)
assert_eq(df, gdf)


@pytest.mark.parametrize("compression", ["gzip", None])
Reviewer (Contributor):

There's a compression_params parametrization you could use:

Suggested change
@pytest.mark.parametrize("compression", ["gzip", None])
@pytest.mark.parametrize("compression", compression_params)

mroeschke (Contributor, Author):

Yeah, I tried this originally, but it appears gzip is the only compression supported for writing JSON.
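
A sketch of roughly what the parametrized round-trip test could look like under that constraint. The test name and the use of tmp_path are assumptions for illustration, not the PR's actual test body:

import pytest

import cudf
from cudf.testing import assert_eq

@pytest.mark.parametrize("compression", ["gzip", None])
def test_to_json_compression(tmp_path, compression):
    # Write compressed (or plain) JSON Lines and read it back.
    df = cudf.DataFrame({"a": [1, 2, 3]})
    path = tmp_path / "test.json"
    df.to_json(path, orient="records", lines=True, compression=compression)
    got = cudf.read_json(path, lines=True, compression=compression)
    assert_eq(df, got)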

@mroeschke (Contributor, Author):

/merge

@rapids-bot rapids-bot bot merged commit 550ea35 into rapidsai:branch-25.02 Dec 19, 2024
109 checks passed
@mroeschke mroeschke deleted the enh/to_json/compression branch December 19, 2024 21:52
Successfully merging this pull request may close these issues:

[FEA] Python-level support for writing compressed JSON format (#17564)